Morphologically and Syntactically Annotated Corpora of Many Languages
Not recorded
Abstract
Annotated corpora have become a standard resource for research in both linguistics and computational processing of natural languages. Lexicographers judge word usage and distribution by occurrences in corpora; part-of-speech tags may help them narrow their queries. Grammarians may use syntactically annotated corpora (treebanks) for queries such as “show me all examples where a verb governs two objects in the accusative.” In natural language processing (NLP), syntactic parsing is an important preparatory step for many tasks such as question answering, data mining or machine translation; the state-of-the-art parsers rely on human-annotated treebanks and apply machine learning algorithms to extract linguistic knowledge from the treebanks.
Similar resources
THE PRAGUE DEPENDENCY TREEBANK A Three-Level Annotation Scenario
The availability of annotated data (with as rich and “deep” annotation as possible) is desirable in any new developments. Textual data are being used for so-called training phase of various empirical methods solving various problems in the field of computational linguistics. While there are many methods that use texts in their plain (or raw) form (in most cases for so-called unsupervised traini...
A Cross-language Approach to Rapid Creation of New Morpho-syntactically Annotated Resources
We take a novel approach to rapid, low-cost development of morpho-syntactically annotated resources without using parallel corpora or bilingual lexicons. The overall research question is how to exploit language resources and properties to facilitate and automate the creation of morphologically annotated corpora for new languages. This portability issue is especially relevant to minority languag...
Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora
We present a method for automatically learning inflectional classes and associated lemmas from morphologically annotated corpora. The method consists of a core language-independent algorithm, which can be optimized for specific languages. The method is demonstrated on Egyptian Arabic and German, two morphologically rich languages. Our best method for Egyptian Arabic provides an error reduction o...
Developing Morphologically Annotated Corpora for Minority Languages of Russia
Despite recent progress in developing annotated corpora for minority languages of Russia, still only about a dozen out of about 100 have comprehensive corpora, and even fewer have computational tools such as machine translation systems or speech recognition modules. However, given that many of them have resources such as dictionaries and grammars, the situation can be improved at relatively low ...
Multi-Level Analysis and Annotation of Arabic Corpora for Text-to-Sign Language MT
The Arabic language is morphologically rich and syntactically complex with many differences from European languages, and this creates a challenge when porting existing annotation tools to Arabic. In this paper, we present an ongoing effort in lexical semantic analysis and annotation of Modern Standard Arabic (MSA) text, a semi-automatic annotation tool concerned with the morphologic, syntactic,...
Journal title:
Volume, issue
Pages -
Publication date: 2014